Re-run Sage 2.1.0 #69

mattwthompson · 2025-10-21T04:30:16Z

$ cp submissions/2024-11-07-Sage-2.1.0/yds.yaml submissions/2025-10-20-Sage-2.1.0/input.yaml

Submission Checklist

Created a new directory in the submissions directory containing the YDS input file and optionally a force field .offxml file
Triggered the benchmark workflow with a PR comment of the form /run-optimization-benchmarks path/to/submission/input.yaml or /run-torsion-benchmarks path/to/submission/input.yaml
Waited for the workflow to finish and a comment with Job status: success to be posted
Reviewed the results committed by the workflow
Published the corresponding Zenodo entry and retrieved the DOI
Added the Zenodo DOI to the table in the main README
Ready to merge!

mattwthompson · 2025-10-21T04:30:40Z

/run-optimization-benchmarks submissions/2025-10-20-Sage-2.1.0/input.yaml

github-actions · 2025-10-21T04:34:30Z

A workflow has been dispatched to run the benchmarks for this PR.

Run ID: 18672934811
Triggering actor: github-actions[bot]
Target branch: rerun-sage-2.1.0

github-actions · 2025-10-21T09:07:20Z

A workflow dispatched to run optimization benchmarks for this PR has just finished.

Run ID: [18672934811]
Triggering actor: github-actions[bot]
Target branch: rerun-sage-2.1.0
Job status: success
[18672934811]: https://github.com/openforcefield/yammbs-dataset-submission/actions/runs/18672934811

mattwthompson · 2025-10-21T15:10:00Z

I'm seeing some differences between this run and submissions/2024-11-07-Sage-2.1.0. They barely show up visually, but do come through in the statistics.

Summary statistics for rmsd differences:
       2024-11-07-Sage-2.1.0  2025-10-20-Sage-2.1.0      abs_diff
count           64474.000000           64474.000000  64474.000000
mean                0.281187               0.281247     -0.000060
std                 0.274567               0.275336      0.086656
min                 0.000000               0.000000     -2.855092
25%                 0.118493               0.118378     -0.005638
50%                 0.191289               0.191362      0.000000
75%                 0.337260               0.336936      0.005735
max                 3.406551               4.095735      2.903419
Out of 64474 entries:
23740 entries have (absolute) difference greater than 0.01
5438 entries have (absolute) difference greater than 0.05
2331 entries have (absolute) difference greater than 0.1
364 entries have (absolute) difference greater than 0.5
101 entries have (absolute) difference greater than 1.0
Summary statistics for dde differences:
       2024-11-07-Sage-2.1.0  2025-10-20-Sage-2.1.0      abs_diff
count           54653.000000           54653.000000  5.465300e+04
mean               -0.745372              -0.745930  5.575600e-04
std                 3.399927               3.400385  4.248813e-01
min              -102.205111            -102.190539 -1.893245e+01
25%                -2.011216              -2.007744 -1.273394e-02
50%                -0.378216              -0.376148  3.304079e-12
75%                 0.800524               0.799714  1.300617e-02
max                96.402958              96.378667  1.524444e+01
Out of 64006 entries:
5504 entries have (absolute) difference greater than 0.1
1354 entries have (absolute) difference greater than 0.5
706 entries have (absolute) difference greater than 1.0
82 entries have (absolute) difference greater than 5.0

Both RMSD results on the same plot:

Distribution of RMSD differences:

Both DDE results on the same plot:

Distribution of DDE differences:

Here's the code I used to generate these plots and statistics, which I ran from this branch.
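For readers without access to that script, the "entries have (absolute) difference greater than ..." counts above could be computed roughly as follows. This is a minimal sketch and not the script itself: it assumes the per-entry values from each run are already paired up in two equal-length lists, and the function name and sample numbers are hypothetical.

```python
import math


def count_exceeding(old, new, thresholds):
    """Count entries whose absolute run-to-run difference exceeds each threshold.

    Pairs in which either value is missing (None or NaN) are skipped.
    """
    if len(old) != len(new):
        raise ValueError("both runs must have the same number of entries")
    diffs = []
    for a, b in zip(old, new):
        if a is None or b is None or math.isnan(a) or math.isnan(b):
            continue
        diffs.append(abs(b - a))
    return {t: sum(d > t for d in diffs) for t in thresholds}


# Made-up values for illustration only, not taken from either run.
old_run = [0.10, 0.25, 1.50, float("nan")]
new_run = [0.10, 0.32, 0.40, 0.20]
counts = count_exceeding(old_run, new_run, [0.01, 0.05, 0.5, 1.0])
for threshold, n in counts.items():
    print(f"{n} entries have (absolute) difference greater than {threshold}")
```

Skipping missing values before counting is one possible reason an "Out of N entries" total could differ from a `count` row, although whether that explains the DDE numbers above is not established here.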

lilyminium · 2025-10-22T07:18:57Z

From a quick look the differences here look probably fine, and I think (caveated with the note in the next sentence) we should merge it so Chapin has a more up-to-date comparison for the protein FF benchmarks. The only note I have is that you'd ideally want to be using the unconstrained version.

I'll leave some of my working below since this was a quick-and-dirty skim. A more rigorous check would actually compare the geometries between the two runs, instead of (here) comparing the difference from QM.

I visually checked the molecules with the highest RMSD differences, which are long and floppy. While we expect that flexible molecules can sometimes slide into a different minimum with minor differences in optimization steps, affecting the torsions, I'd expect bonds and angles to remain relatively inflexible. Bond ICRMSD differences range up to 0.005 at the worst. The majority are very low in magnitude. The outlier points around 0.2 remain outlier points.

If I had more than 5 min I'd be curious which bond/s are contributing to the molecules with the highest differences in bond ICRMSD between runs (highest bond RMSD shown below) and look at geometries.

The same goes for angles: differences range up to 1 degree.

Again if I had more time I'd wonder what's going on with this seemingly uncomplicated molecule.

About 1.2k conformers had the exact same RMSD in both runs, leading me to think they probably minimized to the same structure. Checking the ddEs for these conformers (~750 of which didn't have nan energies) showed some differences ranging -0.4 to 0.4 (kcal/mol?). It's likely this comes from the conformer minimum being different in geometry, although hard to guarantee without checking. 452 conformers had ddE differences < 1e-6 though, and 601 < 1e-3, which seems reasonable.
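The check described above could be sketched roughly as follows. This is an illustration only: it assumes each run is available as a mapping from a conformer ID to an (RMSD, ddE) pair, and the function name, data layout, and sample values are all hypothetical. Exact float equality is used deliberately, since the point is to find conformers whose RMSD is exactly the same in both runs.

```python
import math


def dde_diffs_for_identical_rmsd(run_a, run_b):
    """Return ddE differences (run_b minus run_a) for conformers whose RMSD
    is exactly equal in both runs, skipping NaN energies."""
    diffs = {}
    for conf_id, (rmsd_a, dde_a) in run_a.items():
        if conf_id not in run_b:
            continue
        rmsd_b, dde_b = run_b[conf_id]
        if rmsd_a != rmsd_b:  # also skips NaN RMSDs, since NaN != NaN
            continue
        if math.isnan(dde_a) or math.isnan(dde_b):
            continue
        diffs[conf_id] = dde_b - dde_a
    return diffs


# Made-up data: conformer 3 has a NaN energy, conformer 4 has a different RMSD.
nan = float("nan")
run_2024 = {1: (0.25, 1.0), 2: (0.40, -0.5), 3: (0.10, nan), 4: (0.30, 2.0)}
run_2025 = {1: (0.25, 1.0), 2: (0.40, -0.3), 3: (0.10, 0.7), 4: (0.31, 2.0)}

diffs = dde_diffs_for_identical_rmsd(run_2024, run_2025)
n_tight = sum(abs(d) < 1e-6 for d in diffs.values())
n_loose = sum(abs(d) < 1e-3 for d in diffs.values())
print(len(diffs), n_tight, n_loose)
```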

mattwthompson · 2025-10-22T14:03:33Z

That's a lot for 5 minutes! I've done much less in much more time this morning.

Here's a subset of environment differences, none of which should be a smoking gun:

{'openeye-toolkits': ('2024.1.3', '2025.1.1'),
 'openff-amber-ff-ports': ('0.0.4', '2025.09.0'),
 'openff-forcefields': ('2024.09.0', '2025.10.0'),
 'openff-interchange': ('0.4.0', '0.4.8'),
 'openff-interchange-base': ('0.4.0', '0.4.8'),
 'openff-qcsubmit': ('0.53.0', '0.57.0'),
 'openff-toolkit': ('0.16.5', '0.17.1'),
 'openff-toolkit-base': ('0.16.5', '0.17.1'),
 'openff-units': ('0.2.2', '0.3.1'),
 'openff-utilities': ('0.1.12', '0.1.16'),
 'openmm': ('8.1.2', '8.3.1'),
 'rdkit': ('2024.03.5', '2025.03.6')}
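A comparison like this could be produced by diffing the dependency lists of the two exported environments. The following is only a sketch, assuming each dependency is a conda-style `name=version=build` line as written by `conda env export`; the function names are hypothetical, and the build strings and the numpy entry in the sample data are made up for illustration.

```python
def parse_dependencies(lines):
    """Map package name -> version from conda-style 'name=version=build' lines.

    Lines without a conda-style version (for example a 'pip:' header or
    pip-style 'name==version' entries) are ignored in this sketch.
    """
    versions = {}
    for line in lines:
        parts = line.strip().lstrip("- ").split("=")
        if len(parts) >= 2 and parts[0] and parts[1]:
            versions[parts[0]] = parts[1]
    return versions


def diff_versions(old, new):
    """Return {name: (old_version, new_version)} for packages present in both
    environments whose versions differ."""
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


old_env = [
    "  - openmm=8.1.2=py311_0",
    "  - rdkit=2024.03.5=py311_0",
    "  - numpy=1.26.4=py311_0",
]
new_env = [
    "  - openmm=8.3.1=py312_0",
    "  - rdkit=2025.03.6=py312_0",
    "  - numpy=1.26.4=py312_0",
]
changed = diff_versions(parse_dependencies(old_env), parse_dependencies(new_env))
print(changed)
```

Packages that exist in only one of the two environments are left out here; reporting them separately would be a natural extension.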

The automation does a conda env export, which is good for tracking what was run but doesn't make it easy to use that environment, especially on a different platform. It might be easier to use a tool like conda-lock instead.

I've pulled out a few molecules which get high ICRMSD differences but it's hard to draw conclusions quickly. I will pick this up later.

Otherwise I've

Updated the docs to remind users (including myself) to use unconstrained force fields
Added the conda environment to the committed summary results; it being more accessible would be handy here
Spun out Add descriptive title to Zenodo upload #70 because it's another barrier to analysis
Made Use conda-lock when "exporting" environments #71 for reasons described above

mattwthompson · 2025-10-22T21:55:49Z

For now I will follow your recommendation to keep this moving along. There's lots we can do to make these analyses easier; the barrier here was higher than I had hoped.

mattwthompson · 2025-11-11T20:32:36Z

Not sure where to put this analysis, but for now I'll just infodump on a few molecules I looked at

Key take-aways

Sulfonamides show up frequently in the worst RMSDs - both QM vs. MM and run-to-run MM
Most of the YDS run-to-run variance can be explained by "random" conformational changes
Nitrogens acting as hinges between planar groups also show up occasionally

To get another picture of the sort of differences in these YDS runs, here's a .describe() of a table showing only the differences between valence RMSDs (units are A and degrees):

statistic   Bond_diff  Angle_diff  Dihedral_diff  Improper_diff
count         64474.0     64474.0        64468.0        64254.0
null_count        0.0         0.0            6.0          220.0
mean         0.000043    0.008894       0.399945       0.083824
std          0.000077    0.023497        1.55017       0.348042
min               0.0         0.0            0.0            0.0
25%           0.00001    0.001616       0.031341       0.007912
50%          0.000026    0.004241       0.106133       0.025028
75%          0.000053    0.008939       0.279522       0.065661
max          0.004998    0.906306      41.369588      11.990329

I'm somewhat (?) reassured that the order of magnitude of bond differences is quite small and even the top quartile is not so bad. The order of magnitude of angle differences also seems good, even the maximum value. Torsion differences are more suspect in the middle and very suspect on the high end - so far, these seem to be where minimizing to a different conformer shows up quantitatively.
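As a rough sketch of how such per-term differences might be computed and then ranked to find the molecules worth inspecting (this is not the notebook's actual code), assume each run maps a molecule ID to a dictionary of valence ICRMSDs, with None marking a missing value. The data layout, function names, IDs, and numbers below are hypothetical.

```python
TERMS = ("Bond", "Angle", "Dihedral", "Improper")


def valence_diffs(run_a, run_b):
    """Absolute run-to-run difference per valence term for each shared molecule.

    A term that is missing (None) in either run yields None for that column.
    """
    table = {}
    for mol_id in run_a.keys() & run_b.keys():
        row = {}
        for term in TERMS:
            a = run_a[mol_id].get(term)
            b = run_b[mol_id].get(term)
            row[term + "_diff"] = None if a is None or b is None else abs(b - a)
        table[mol_id] = row
    return table


def worst_offenders(table, column, n=3):
    """IDs of the n molecules with the largest non-missing value in column."""
    present = [
        (row[column], mol_id)
        for mol_id, row in table.items()
        if row[column] is not None
    ]
    return [mol_id for _, mol_id in sorted(present, reverse=True)[:n]]


# Made-up values for three placeholder molecules.
run_2024 = {
    "mol-a": {"Bond": 0.010, "Angle": 1.0, "Dihedral": 5.0, "Improper": None},
    "mol-b": {"Bond": 0.020, "Angle": 2.0, "Dihedral": 10.0, "Improper": 0.5},
    "mol-c": {"Bond": 0.015, "Angle": 1.5, "Dihedral": 7.0, "Improper": 0.2},
}
run_2025 = {
    "mol-a": {"Bond": 0.011, "Angle": 1.1, "Dihedral": 25.0, "Improper": None},
    "mol-b": {"Bond": 0.025, "Angle": 2.0, "Dihedral": 10.5, "Improper": 0.6},
    "mol-c": {"Bond": 0.015, "Angle": 1.5, "Dihedral": 7.0, "Improper": 0.2},
}

table = valence_diffs(run_2024, run_2025)
worst_bonds = worst_offenders(table, "Bond_diff", n=2)
worst_dihedral = worst_offenders(table, "Dihedral_diff", n=1)
print(worst_bonds, worst_dihedral)
```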

Generally suspicious molecules

36966572

This molecule is just ... sorta cursed. I don't know if phosphates truly like to comprise part of a (7-membered) ring but the 3-D structure is quite poor, mostly at the phosphate group. Here are the bonds labeled by how much they differ between QM and MM, where the difference is more than 0.02 Angstroms:

This molecule has the worst bond RMSD in both 2.1.0 runs and was the topic of discussion in one of Chapin's bug reports.

This is not necessarily relevant to the reproducibility of YDS runs, but notable.
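The bond labels mentioned for this molecule amount to comparing each bond length between the QM and MM geometries. A minimal sketch, assuming both geometries share the same atom ordering, coordinates are in Angstroms, and the difference is taken as MM minus QM (the sign convention used in this thread is not stated); the function name and toy coordinates are hypothetical:

```python
import math


def bad_bonds(qm_coords, mm_coords, bonds, threshold=0.02):
    """Return {(i, j): mm_length - qm_length} for bonds whose length differs
    by more than threshold (in Angstroms) between the two geometries."""
    flagged = {}
    for i, j in bonds:
        qm_len = math.dist(qm_coords[i], qm_coords[j])
        mm_len = math.dist(mm_coords[i], mm_coords[j])
        diff = mm_len - qm_len
        if abs(diff) > threshold:
            flagged[(i, j)] = diff
    return flagged


# Toy three-atom chain along the x axis, not a real molecule.
qm = [(0.0, 0.0, 0.0), (1.50, 0.0, 0.0), (2.50, 0.0, 0.0)]
mm = [(0.0, 0.0, 0.0), (1.45, 0.0, 0.0), (2.46, 0.0, 0.0)]
bonds = [(0, 1), (1, 2)]

flagged = bad_bonds(qm, mm, bonds)
for (i, j), diff in flagged.items():
    print(f"bond {i}-{j}: {diff:+.4f} A")
```

Because bond lengths are internal coordinates, this comparison does not require aligning the two structures first.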

mattwthompson · 2025-11-11T20:48:18Z

Molecules with suspiciously different bond RMSDs

Key take-aways:

The two worst offenders had significant conformational change
S-N bond lengths (but not S-O or N-R) in sulfonamides are bad and probably drive the numerical differences
Non-aromatic rings like to flop about (okay, not breaking news ...)

37015581

This is the first one you looked at above; it has the biggest difference in bond ICRMSDs between YDS runs. The molecule doesn't look particularly exotic, but there's an N+ in the middle that's causing some issues, with multiple bonds more than 0.02 A from QM.

2024:

2025:

The two YDS runs minimized to different conformers - last year's run wasn't too far away from QM (0.2 A RMSD) but this year's run was a bit more (0.6 A). This year's run minimized it to a different conformer; the (non-planar in QM) ring that the N+ is a part of underwent an inversion that caused the other ring (the one with one OH, not two OHs) to rotate a little bit. DDEs are similar (1.919324 before vs. 1.929937 now), suggesting to me that the new conformer it found is okay. With some magic it might be possible to match these "new" conformers to existing ones, but I didn't do that.

43427452

This is the second-biggest run-to-run difference in bond lengths and is also due to finding a different conformer. The old results aren't great (1.0 A RMSD vs QM and several unhappy bonds):

but you can see that the aromatic ring hanging off of the cursed non-aromatic ring goes from one location (2024):

all the way to a separate one:

probably doing some pi-pi stacking on those aromatic rings (ddE decreased by about 10 kcal/mol).

36971754

This is the third-worst difference in bond RMSDs. There isn't a huge structural difference visually, just a bit of rotation of the 5-membered ring and two hydrogens on the N on the other side of the molecule. Here's the "bad bonds" image for the 2025 run:

The 2024 run is about the same except the S-N bond is -0.0671. I'm guessing that there's a delicate balance between the proton/nitrogen geometry and electrostatics that an optimizer might not always agree with itself on. This is hand-wavy but both S-N bonds are bad and I'm not so alarmed that they're somewhat differently-bad.

36960186

This is the last run-to-run bond RMSD diff above 0.003. There's definitely a conformational change of the blob of non-aromatic rings (closer to the viewer in 3D, on the left in 2D), maybe a bit of them flopping around but definitely the trivalent nitrogen cet

2024 (0.3804 A RMSD overall vs. QM):

2025 (1.2883 A):

I notice that the S-N bond is quantitatively suspect in both, but not in agreement run-to-run.

37008583

The least exciting of them so far. The 3D structure is not so bad; QM-vs-MM overall RMSDs are 0.4585 and 0.4494 so I won't dwell on the 3D structure. The story is similar to before: S-N bonds are bad and not numerically consistent run-to-run:

2024:

2025:

Curious how the S-N bonds in the ring are nearly identical but the one on the right is not.

mattwthompson · 2025-11-12T23:10:07Z

Here's the notebook I've been using - in its current state, tidied up only partially, and not production-quality work: Compare.ipynb.txt

Re-run Sage 2.1.0

345da0c

Add benchmark results

d75fa6f

Add DOI to README

7ca7b6f

mattwthompson marked this pull request as ready for review October 21, 2025 15:12

mattwthompson merged commit 33ffe7c into main Oct 22, 2025

mattwthompson mentioned this pull request Oct 22, 2025

Re-run Sage 2.1.0 data #68

Closed

lilyminium mentioned this pull request Nov 13, 2025

Differences between old and new stack #82

Open
